tanh function
Tanh Works Better with Asymmetry
Batch Normalization is commonly located in front of activation functions, as proposed by the original paper. Swapping the order, i.e., using Batch Normalization after activation functions, has also been attempted, but its performance is generally not much different from the conventional order when ReLU or a similar activation function is used. However, in the case of bounded activation functions like Tanh, we discovered that the swapped order achieves considerably better performance than the conventional order on various benchmarks and architectures. This paper reports this remarkable phenomenon and closely examines what contributes to this performance improvement. By looking at the output distributions of individual activation functions, not the whole layers, we found that many of them are asymmetrically saturated.
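To make the two orderings concrete, here is a minimal PyTorch sketch of a conventional BN-then-Tanh block versus the swapped Tanh-then-BN block studied in the paper; the linear layer and the sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Conventional order: BatchNorm placed in front of the activation (BN -> Tanh).
conventional_block = nn.Sequential(
    nn.Linear(128, 128),
    nn.BatchNorm1d(128),
    nn.Tanh(),
)

# Swapped order examined in the paper: activation first, then BatchNorm (Tanh -> BN).
swapped_block = nn.Sequential(
    nn.Linear(128, 128),
    nn.Tanh(),
    nn.BatchNorm1d(128),
)

x = torch.randn(32, 128)  # dummy mini-batch
print(conventional_block(x).std(), swapped_block(x).std())
```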
AlphaGrad: Non-Linear Gradient Normalization Optimizer
We introduce AlphaGrad, a memory-efficient, conditionally stateless optimizer addressing the memory overhead and hyperparameter complexity of adaptive methods like Adam. AlphaGrad enforces scale invariance via tensor-wise L2 gradient normalization followed by a smooth hyperbolic tangent transformation, $g' = \tanh(\alpha \cdot \tilde{g})$, controlled by a single steepness parameter $\alpha$. Our contributions include: (1) the AlphaGrad algorithm formulation; (2) a formal non-convex convergence analysis guaranteeing stationarity; (3) extensive empirical evaluation on diverse RL benchmarks (DQN, TD3, PPO). Compared to Adam, AlphaGrad demonstrates a highly context-dependent performance profile. While exhibiting instability in off-policy DQN, it provides enhanced training stability with competitive results in TD3 (requiring careful $\alpha$ tuning) and achieves substantially superior performance in on-policy PPO. These results underscore the critical importance of empirical $\alpha$ selection, revealing strong interactions between the optimizer's dynamics and the underlying RL algorithm. AlphaGrad presents a compelling alternative optimizer for memory-constrained scenarios and shows significant promise for on-policy learning regimes where its stability and efficiency advantages can be particularly impactful.
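A minimal sketch of the update rule quoted above, assuming a plain SGD-style step; the default steepness value and the eps safeguard are illustrative assumptions not given in the abstract.

```python
import torch

def alphagrad_step(params, lr=1e-3, alpha=5.0, eps=1e-8):
    # AlphaGrad-style update as described in the abstract: normalize each
    # parameter's gradient to unit L2 norm (tensor-wise), squash it with
    # tanh(alpha * g_tilde), and apply the result as a plain gradient step.
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            g_tilde = p.grad / (p.grad.norm() + eps)        # tensor-wise L2 normalization
            p.add_(torch.tanh(alpha * g_tilde), alpha=-lr)  # g' = tanh(alpha * g_tilde)

# Usage sketch:
model = torch.nn.Linear(10, 1)
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
alphagrad_step(model.parameters())
```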
Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function
Hongye Zheng, Bingxing Wang, Minheng Xiao, Honglin Qin, Zhizhong Wu, Lianghao Tan
Adaptive optimizers are pivotal in guiding the weight updates of deep neural networks, yet they often face challenges such as poor generalization and oscillation issues. To counter these, we introduce sigSignGrad and tanhSignGrad, two novel optimizers that integrate adaptive friction coefficients based on the Sigmoid and Tanh functions, respectively. These algorithms leverage short-term gradient information, a feature overlooked in traditional Adam variants like diffGrad and AngularGrad, to enhance parameter updates and convergence. Our theoretical analysis demonstrates the wide-ranging adjustment capability of the friction coefficient S, which aligns with targeted parameter update strategies and outperforms existing methods in both optimization trajectory smoothness and convergence rate. Extensive experiments on CIFAR-10, CIFAR-100, and Mini-ImageNet datasets using ResNet50 and ViT architectures confirm the superior performance of our proposed optimizers, showcasing improved accuracy and reduced training time. The innovative approach of integrating adaptive friction coefficients as plug-ins into existing optimizers, exemplified by the sigSignAdamW and sigSignAdamP variants, presents a promising strategy for boosting the optimization performance of established algorithms. The findings of this study contribute to the advancement of optimizer design in deep learning.
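The abstract does not give the exact update rule, but where a bounded friction coefficient would enter an Adam-style step can be sketched as follows; the definition of S used here (tanh of the current absolute gradient) is purely a placeholder, not the paper's formula.

```python
import torch

def adam_step_with_friction(p, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Standard Adam moment estimates.
    g = p.grad
    m.mul_(beta1).add_(g, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Placeholder "friction" coefficient S in (0, 1): bounded by tanh and driven
    # by the current (short-term) gradient. The real sigSignGrad / tanhSignGrad
    # definition of S is not specified in the abstract and may differ.
    S = torch.tanh(g.abs())

    with torch.no_grad():
        p.add_(-lr * S * m_hat / (v_hat.sqrt() + eps))
```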
Deep Learning Activation Functions: Fixed-Shape, Parametric, Adaptive, Stochastic, Miscellaneous, Non-Standard, Ensemble
In the architecture of deep learning models, inspired by biological neurons, activation functions (AFs) play a pivotal role. They significantly influence the performance of artificial neural networks. By modulating the non-linear properties essential for learning complex patterns, AFs are fundamental in both classification and regression tasks. This paper presents a comprehensive review of various types of AFs, including fixed-shape, parametric, adaptive, stochastic/probabilistic, non-standard, and ensemble/combining types. We begin with a systematic taxonomy and a detailed classification framework that delineates the principal characteristics of AFs and organizes them based on their structural and functional distinctions. Our in-depth analysis covers primary groups such as sigmoid-based, ReLU-based, and ELU-based AFs, discussing their theoretical foundations, mathematical formulations, and specific benefits and limitations in different contexts. We also highlight key attributes of AFs such as output range, monotonicity, and smoothness. Furthermore, we explore miscellaneous AFs that do not conform to these categories but have shown unique advantages in specialized applications. Non-standard AFs are also explored, showcasing cutting-edge variations that challenge traditional paradigms and offer enhanced adaptability and model performance. We examine strategies for combining multiple AFs to leverage complementary properties. The paper concludes with a comparative evaluation of 12 state-of-the-art AFs, using rigorous statistical and experimental methodologies to assess their efficacy. This analysis not only aids practitioners in selecting and designing the most appropriate AFs for their specific deep learning tasks but also encourages continued innovation in AF development within the machine learning community.
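To make the distinction between the first two groups in this taxonomy concrete, here is a short sketch contrasting fixed-shape and parametric activation functions (plain NumPy, illustrative values only).

```python
import numpy as np

# Fixed-shape AFs: the function has no learnable parameters.
def relu(x):
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):  # alpha is a fixed hyperparameter here, not learned
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# Parametric AF: PReLU, where the negative slope `a` is a learned parameter
# updated by gradient descent along with the network weights.
def prelu(x, a):
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 1.0])
print(relu(x), elu(x), prelu(x, a=0.25), sep="\n")
```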
Uncovering the Power and Limitations of the TanH Activation Function in Neural Networks
The TanH activation function is a commonly used activation function in neural networks. Similar to the Sigmoid function, the TanH function is particularly useful for binary classification tasks. In this post, we'll be exploring the power and limitations of using the TanH activation function in neural networks. We'll look at its unique properties, advantages, and disadvantages, and discuss some use cases where the TanH function is particularly effective. One of the main advantages of using the TanH function is that it's a zero-centered function.
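The zero-centered property is easy to verify numerically by comparing tanh with the sigmoid on the same symmetric inputs (a small NumPy check with arbitrary sample values).

```python
import numpy as np

x = np.linspace(-4, 4, 1001)
tanh_out = np.tanh(x)                    # outputs bounded to (-1, 1)
sigmoid_out = 1.0 / (1.0 + np.exp(-x))   # outputs bounded to (0, 1)

# tanh outputs are centered around zero; sigmoid outputs are not.
print("tanh mean:   ", tanh_out.mean())     # ~0.0 for symmetric inputs
print("sigmoid mean:", sigmoid_out.mean())  # ~0.5 for symmetric inputs
```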
How to Use Activation Functions in Neural Networks
In this Python tutorial, we learn about How to Use Activation Functions in Neural Networks. Activation functions play an integral role in neural networks by introducing nonlinearity. This nonlinearity allows neural networks to develop complex representations and functions based on the inputs that would not be possible with a simple linear regression model. Many different nonlinear activation functions have been proposed throughout the history of neural networks. In this post, you will explore three popular ones: sigmoid, tanh, and ReLU.
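The claim about nonlinearity can be demonstrated in a few lines: without an activation between them, two linear layers collapse into a single linear map, while inserting tanh breaks that equivalence (a toy NumPy example with random, purely illustrative weights).

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two linear layers without an activation are equivalent to one linear layer.
linear_stack = W2 @ (W1 @ x)
collapsed    = (W2 @ W1) @ x
print(np.allclose(linear_stack, collapsed))     # True

# Inserting tanh between the layers breaks this equivalence.
nonlinear_stack = W2 @ np.tanh(W1 @ x)
print(np.allclose(nonlinear_stack, collapsed))  # False
```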
Activation Functions and their purpose: Binary, Linear, ReLU, Sigmoid, Tanh and Softmax
In the context of a neural network, an activation function defines the output of a node/neuron. Activation functions can be classified into these categories: Ridge activation functions, Radial activation functions, and Folding activation functions. For this article we will be looking at Ridge activation functions. The binary step function is a threshold-based activation function: if the input crosses a certain value the neuron is activated, and if it falls below that value the neuron is deactivated. It can be used for binary classification tasks, but it is not suitable at all when non-linearity is needed (most problem domains). Also, since the step function is not differentiable, gradient-based training is not possible. Next, the linear activation function is directly proportional to the weighted sum of its inputs: f(x) = x.
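The two functions described above can be written down directly (a small NumPy sketch; the threshold of 0 for the step function is an assumed convention).

```python
import numpy as np

def binary_step(x, threshold=0.0):
    """Fires (1) when the input crosses the threshold, stays off (0) otherwise."""
    return np.where(x >= threshold, 1.0, 0.0)

def linear(x):
    """Directly proportional to the weighted sum of inputs: f(x) = x."""
    return x

x = np.array([-1.5, -0.2, 0.0, 0.3, 2.0])
print(binary_step(x))  # [0. 0. 1. 1. 1.]
print(linear(x))
```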